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Abstract 

By fixing all parameters in a phylogenetic likelihood model except for one branch 
length, one obtains a one-dimensional likelihood function. In this work, we introduce a 
mathematical framework to characterize the shapes of such one-dimensional phyloge¬ 
netic likelihood functions. This framework is based on analyses of algebraic structures 
on the space of all frequency patterns with respect to a polynomial representation of 
the likelihood functions. Using this framework, we provide conditions under which 
the one-dimensional phylogenetic likelihood functions are guaranteed to have at most 
one stationary point, and this point is the maximum likelihood branch length. These 
conditions are satisfied by common simple models including all binary models, the 
Jukes-Cantor model and the Felsenstein 1981 model. 

We then prove that for the simplest model that does not satisfy our conditions, 
namely, the Kimura 2-parameter model, the one-dimensional likelihood functions may 
have multiple stationary points. As a proof of concept, we construct a non-degenerate 
example in which the phylogenetic likelihood function has two local maxima and a local 
minimum. To construct such examples, we derive a general method of constructing 
a tree and sequence data with a specified frequency pattern at the root. We then 
extend the result to prove that the space of all rescaled and translated one-dimensional 
phylogenetic likelihood functions under the Kimura 2-parameter model is dense in 
the space of all non-negative continuous functions on [0, oo) with finite limits. These 
results indicate that one-dimensional likelihood functions under advanced evolutionary 
models can be more complex than it is typically assumed by phylogenetic inference 
algorithms; however, these complexities can be effectively captured by the Kimura 
2-parameter model. 

Keywords evolutionary model, molecular evolution, phylogenetics, likelihood model, 
characteristic polynomial, algebraic representation, multimodality, universal model. 


1 Introduction 

The likelihood of a phylogenetic model is a function of the parameters of continuous time 
Markov chains (CTMCs) used to model sequence evolution along each branch. It is common 
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to assume a single rate matrix and stationary frequency for the CTMCs but allow the 
branch lengths to vary, representing a single evolutionary process but differing amounts 
of evolution along each branch. Commonly used maximum-likelihood phylogeny programs 
improve likelihood by modifying branch lengths iteratively and one at a time [1]. The general 
approach for numerical maximization of the one-dimensional likelihood function given by 
hxing every parameter except for one branch length is to iteratively sample the function at a 
number of points, use surrogate functions to £t simple curves to those points, and use those 
hts as approximations to locate the maximum branch length. For example, programs often 
employ Newton’s method, in which the intuitive idea is to use hrst and second derivatives 
to approximate the likelihood function (varying along that branch) by a surrogate quadratic 
function. Since evaluations of the likelihoods (and their derivatives) are computationally 
expensive, many approaches have been tried to improve the efficiency of this optimization 
procedure [1]. 

Such approaches, however, rely on the assumptions that one-dimensional phylogenetic 
likelihood functions belong to some class of simple functions, and that the surrogate model 
can, at least, capture the shape of the functions. While there has been a considerable 
amount of work on finding multiple maxima of the multi-dimensional likelihood surfaces 
parameterized by all branch lengths for a tree [2, 3, 4], little has been done about the shapes 
of one-dimensional phylogenetic likelihood functions. The only attempt to investigate the 
shape of the one-dimensional phylogenetic likelihood functions has been [5] , which provided 
a proof of uniqueness of the stationary points for one-dimensional phylogenetic likelihood 
functions in the case of the one parameter model of nucleotide substitution. Based on this 
proof, the authors of [5] asserted that there is at most one stationary point of the full 
likelihood surface. This claim was later disproved by [2], although the proof for the one¬ 
dimensional case still holds. However, the result has not been examined for the more complex 
models used in practice. 

In this work, we introduce a mathematical framework to characterize the shapes of such 
one-dimensional phylogenetic likelihood functions. This framework is based on analyses of 
algebraic structures on the space of all frequency patterns with respect to a polynomial 
representation of the likelihood functions. Specihcally, we introduce the new concept of 
logarithmic relative frequency patterns and analyze algebraic structures on the space of such 
patterns. These structures, along with the characteristic polynomial representations of one¬ 
dimensional phylogenetic likelihood functions, open a new way to explore the space of all 
possible likelihood functions. Moreover, by composing these structures, we are able to tackle 
the inverse problem of constructing a phylogenetic tree that has a given frequency pattern 
at the root. This enables us to construct phylogenetic trees that approximate any given 
likelihood function with arbitrary precision. 

Using this framework, we provide conditions under which the one-dimensional phyloge¬ 
netic likelihood functions are guaranteed to have at most one stationary point, and this point 
is the maximum of the one-dimensional function. These conditions are satished by common 
simple models including all binary models, the Jukes-Cantor model [6] and the Felsenstein 
1981 model [7]. We then prove that for the simplest model that does not satisfy our condi¬ 
tions, namely, the Kimura 2-parameter model [8], the one-dimensional likelihood functions 
may have multiple stationary points. As a proof of concept, we construct a non-degenerate 
example in which the phylogenetic likelihood function has two local maxima and a local 
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Figure 1: [Left:] A general Markov model of DNA evolution along a tree edge. [Right:] An 
extension a of the labelling 'ipi (corresponds to site i in the sequences) of the leaves of a 
simple tree r to its inner nodes. 


minimum. 

We then extend the result to prove that the space of all rescaled and translated one¬ 
dimensional phylogenetic likelihood functions under the Kimura 2-parameter model is dense 
in the set of all non-negative continuous functions on [0, oo) with hnite limits. These results 
indicates that one-dimensional likelihood functions under advanced evolutionary models can 
be more complex than it is typically assumed by phylogenetic inference algorithms; however, 
these complexities can be effectively captured by the Kimura 2-parameter model. 


2 Background and Definitions 

2.1 Markov models of sequence evolution 

Our setting is the standard IID setting for likelihood-based phylogenetics with a hnite number 
of sites; we review the basics here but refer the reader to [9] for more details. Let O denote 
the set of states and let r = |0|. For convenience, we assume that the states have indices 1 
to r. 

For an unrooted tree T with N taxa, we use E(T) and V(T) to denote the set of edges 
and vertices of T, respectively. On each edge e G E(T), we assume that the mutation events 
occur according to a continuous time Markov chain on states O with instantaneous rate 
matrix Q^. This rate matrix and the branch length on the edge e dehne the transition 
matrix P® = on edge e, where Pij{te) denotes the probability of mutating from state i 
to state j across the edge e (with length p)- 

We further assume that for all edges e G E{T), the Markov chains that describe the 
mutation events are ergodic and time-reversible with respect to a hxed stationary distribution 
TT, that is 



and 

V(, 


3 











for all and e G E(T). 

The phylogenetic likelihood is computed as follows given a set of (aligned) observed 
sequences tjj = ^ of length S over N taxa of a tree r. First orient 

the edges of r away from an arbitrarily chosen root p of the tree. (We can choose the 
root arbitrarily since each P^. is reversible with respect to tt.) Each site i in the sequences 
determines a labeling ipi of each leaf by a state in hi. An extension a of a labeling ipi is an 
assignment of states to all of the nodes in the tree that agrees with ip on the leaves. 

The probability of an extension a given the vector of branch lengths t = {te)e&E{T) is 
dehned to be the probability of the state at the root (given by the stationary distribution) 
multiplied by the probabilities of all the state transitions (including self-transitions) across 
each branch in the tree 

P(a\t) = wia,) n 

{u,v)(iE{T) 

where a„ denotes the assigned state of node u by a. 

The likelihood of the data at site i is then the marginal probability over all the extensions 

pia\t). 

a extends ip 

We further assume, as is standard, that evolution is independent between sites. This implies 
that the likelihood of a set of sequences evolving is just the product of the probabilities for 
the individual sites 

S' 

^(^|t) = JJP(^/’*|t). 

In summary, the likelihood of observing ip given the tree topology r and the vector of branch 
lengths t = (te)e£E{T) has the form 

s 

nv'it)=nz’^w n 

s=l a {u,v)£E{T) 

where a ranges over all extensions of ip to the internal nodes of T and a„ denotes the assigned 
state of node u by a. 

For readers familiar with the theory of probabilistic inference on graphical models, the 
likelihood functions studied in this paper can be alternatively described as follows. Consider 
a tree T and let {Xy : v G V{T)} be a collection of random variables indexed by the nodes 
of the tree. For each edge {u,v) G E(T), we dehne the nonnegative potential function 

:= PPp{t). 

We assume that the joint probability distribution p{xv{T)) factorizes over the tree edges: 

P (xviT)) ~ n k(u,v) {Xui Xy , tyyP 

iu,v)£E(T) 

The likelihood functions of interest may then be represented as the marginal probability 
of the observation ip on the leaves of the tree T. This formulation allows us to study 
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the phylogenetic likelihood functions beyond the reversible Markov framework. We will 
investigate partial extensions to this more general case in Section 7, but for the next several 
sections we will focus on the standard phylogenetic setting (in which we can prove the 
strongest results). 

2.2 One-dimensional phylogenetic likelihood functions 

To investigate the one-dimensional likelihood function on one branch cq, we hx all other 
branches, partition the set of all extensions of 'ijj according to their labels at the end points 
of Co, and split E(T) into two sets of edges i^ieft and i^right corresponding to the location 
of the edges with respect to Cq. The likelihood function can be rewritten as a univariate 
function of t, the branch length of Cq: 




s=l ij a<^Ai 




X Pff(f) X 


, eG^/right 


where Aij denotes the set of all extensions of V’ for which the labels at the left end point and 
the right end point of cq are i and j, respectively. We note that some Aij may be empty if 
Co is a pendant edge and the observed value on the corresponding leaf is not i. 

By grouping the products over T'left and i^right as well as the sum over a in a single term 
b^j, we can dehne the one-dimensional log-likelihood function as 


-eo 


(i) = logum = ^log 


S=1 




Such ieoit) are the object of study of this paper. 

For convenience, we will assume that eo has been chosen and will drop the index cq 
hereafter. 


2.3 Evolutionary models 

Throughout the paper, we use the term evolutionary model on state set to refer to a 
collection Ti of (Q, tt) pairs, where tt is a vector of stationary frequencies and Q is a rate 
matrix on that is reversible with respect to vr. If at every edge of the tree r, the matrix- 
frequency pair {Qe, vr) belongs to Ti, we say that r is a tree under evolutionary model Ti. 

We will consider a number of different evolutionary models of DNA sequences. These 
DNA substitution models differ in terms of the parameters used to describe the rates at 
which one state replaces another during evolution and the stationary frequencies: 

• Jukes-Cantor model [6]: this model assumes equal stationary frequencies (vr^ = t^g = 
tit = Tic = 1/4) and equal mutation rates. 
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Felsenstein 1981 model [7]: this is an extension of the Jnkes-Cantor model in which 
stationary freqnencies are allowed to vary. 


• Kimnra 2-parameter model [ 8 ]: this model assnmes eqnal stationary freqnencies, bnt 
distingnishes between the rates of transitions {A -H- G, i.e. from purine to purine, 
or G -H- T, i.e. from pyrimidine to pyrimidine) and transversions (from purine to 
pyrimidine or vice versa). 

Following common usage, we use k to denote the transition/transversion rate ratio and 
write the rate matrix for this model as 


/ - 


Q.= 


2(k + 1) 


(k -|- 2) 

K 

1 

1 


K 

(k -|- 2) 
1 
1 


1 

1 

■(k -|- 2) 

K 



—(k + 2) y 


The special case k = 3 will play a central role in the analysis of this paper. Note that 
the single k parameter in the Kimnra 2-parameter model determines a rate matrix that 
is shared across the tree, while this paper primarily concerns the effect of changing a 
single branch length parameter. 


While the focus here is on DNA models, we emphasize that our theoretical framework 
is capable of analyzing any time-reversible evolutionary model on any state space. In fact, 
we do not assume a uniform molecular clock, or even a single evolutionary model along the 
edges of the tree. 


2.4 Characteristic polynomials of one-dimensional phylogenetic 
likelihood functions 

We will frequently use the following assumption: 

Assumption 2.1. The eigenvalues of the rate matrix Q are equal to 

0 = dob > —di7 > —^27 > • • • > —dr-ij 

for some positive number 7 and non-negative integers di,, d^-i- 

The following remark, whose proof is provided in the Appendix, guarantees that Assump¬ 
tion 2.1 does not affect the generality of our analyses up to an arbitrarily small approximation 
error: 

Remark 2.1. The set of rate matrices Q for a given evolutionary model that satisfy As¬ 
sumption 2.1 is dense in the set of rate matrices under the same evolutionary model. 

Under Assumption 2.1, if we denote the entries of the diagonalizing matrix M and N of 
Q by niij and respectively, then the transition probabilities can computed as 

k 
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By reparametrizing with x := e we can represent these transition probabilities as poly¬ 
nomial functions 

= '^rriikx'^’^nkj. 
k 

Similarly, the log-likelihood function can be rewritten as 

= X]log(A5(a;)) where A5(a;) = y^hl^Pij{x). 

s=l ij 

Hereafter, we will refer to Pij{x) and As(a;) as the transition polynomials of the evolutionary 
model and the characteristic polynomials of the one-dimensional phylogenetic likelihood 
function, respectively. 

As we will see in later sections, this polynomial representation will enable us to exploit 
many algebraic and analytic properties of the likelihood functions. The most noticeable fea¬ 
ture is that one can use the Fundamental Theorem of Algebra to factorize As(a;) as products 
of linear and quadratic polynomials. As a result, the log-likelihood function can be written 
in the form 


S is,i 

= X] X] log(«s,i + /3s,ix) 

s=l i=l 

S is,2 

+ EE \og{fJ,s,i + l^s,iX + U}s,ix‘^) 

s=l i=l 

where r's,i,i^s,i are the (real) coefficients of the quadratic polynomials in the decomposi¬ 
tion of Xg, while as,i,/3s,i are coefficients of the linear terms in the decomposition. 

This enables us to decompose a complicated evolutionary model into smaller modules, 
each of which can be approximated either by a “linear” model (like the binary symmetric 
model) or by a “quadratic” model (like the Kimura 2-parameter model). In Section 3, we use 
this formulation to prove that if the phylogenetic log-likelihood function is essentially linear 
(that is, there are no quadratic terms in the expression), its shape resembles those generated 
by binary models, with a unique stationary point that is also the maximum point. In 
Section 5, we illustrate that this property does not hold for quadratic models by constructing 
a counter-example with the Kimura 2-parameter model. Finally, in Section 6, we use this 
formulation once again to prove that the space of all rescaled and translated one-dimensional 
phylogenetic likelihood functions under the Kimura 2-parameter model is dense in the space 
of all continuous functions on [0, cx)) with hnite limits. 


3 Uniqueness of the stationary point 

In this section, we discuss a condition under which the uniqueness of the stationary branch 
length is guaranteed. 

The analyses in this section stem from two observations: 
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If for every site index s, the characteristic polynomial has no non-real root, then the 
likelihood function can be decomposed into smaller modules, each of which resemble a 
binary model. 

2. The likelihood functions of binary models and summations of such models are incave. 

Definition 3.1 (Hanson [10]). A vector-valued function f is said to be incave in M” if there 
exists a vector-valued function (j){t, u) such that 

fit) - /(«) < 4>it, u) ■ V/(m), Vt, ueW^ 

where V/ denotes the gradient of f. 

Incave functions were introduced in the optimization literature as a generalization of 
concave functions[10] . It can be proven that a function is incave if and only if every stationary 
point is a global maximum [11], We are interested in the case of functions of a single real 
argument, for which the following result also holds: 

Lemma 3.1. If f is a real-valued incave function with a finite number of stationary points, 
then f has at most one stationary point. Moreover, if such a point exists, it is also a global 
maximum. 

Proof. Denote H = {f G [0, cxd) : f'{t) = 0} and assume that A has more than one element. 
Since A is hnite, we can choose two elements ti and t 2 in A such that the interval (^ 1 ,^ 2 ) C 
M — H. Since / is incave, every stationary point of / is a global maximum. We deduce that 
ti and ^2 are both global maxima of / and /(H) = f{t 2 )- Using the mean value theorem, 
there exists t G (^ 1 ,^ 2 ) such that f'{t) = 0. This is a contradiction. □ 

This enables us to prove the following theorem. 

Theorem 3.1. If for every site index s, the polynomial Xg has only real roots, then i has at 
most one stationary point. Moreover, if such a point exists, it is also a global maximum. 

Proof. Since has only real roots, it can be written as product of linear functions 

dp 

^s{x) = JJ +/5s,ix) 

i=l 

where dp, dehned in Assumption 2.1, is the degree of the polynomial A*. 

The log-likelihood function i can be computed as 

= ^log(A,(e-^*)) 

S /dp 

= log ( n 

S = 1 \2=1 

S dp 

= log(«s,i + /3s,ie~^^). 

s=l i=l 

8 


l{t) 


For any t,u > 0, we have 


i{t) - 


< 


f oisi + (dsiC 

S =1 2=1 \ ) / 

S dp 

EE ( 

S =1 2=1 

' as,i + fds,ie \ 


m dp 

EE ( 

S =1 2=1 


^ as,i + / 

i ( l - e - 

7 

S dp , 

S=1 2=1 ^ ’ 


Ps,i 7e 


—'yu 


3—7ii 


- (1 

7 


Hence, i is an incave fnnction. 

Fnrthermore, since A^, are polynomial and is a bijective map from [0,oo) to (0,1], 
we dednce that £{t) only has a hnite nnmber of stationary points. Using Lemma 3.1, we 
conclnde that i has at most one stationary point; moreover, if snch a point exists, it is also 
a global maximnm. 

□ 


We note that Theorem 3.1 imposes a condition on the characteristic polynomials rather 
than the evolntionary model, and can be applied to assess the nniqneness of the stationary 
point of any time-reversible evolntionary model satisfying Assnmption 2.1. In fact. The¬ 
orem 3.1 does not assnme a nniform molecnlar clock, or even a single evolntionary model 
along the edges of the tree. However, it is worth noting that for the class of models on which 
the rate matrices have only one non-zero eigenvalne, the result automatically holds: 

Corollary 3.1. For binary, Jukes-Cantor and Felsenstein 1981 models, the one-dimensional 
likelihood function has at most one stationary point; if such point exists, it is the global 
maximum. 

We also note that the results in previous studies about the number of maxima of likelihood 
surfaces [2, 3, 4] are derived for binary models. Theorem 3.1 complements those results in the 
sense that while the likelihood surfaces considered in those work may have multiple (or even 
a continuum of) local maxima, the stationary points of one-dimensional likelihood functions 
are still unique. 

This corollary also extends and clarihes a result from the hrst attempt to investigate 
the shape of the one-dimensional phylogenetic likelihood functions [5]. By studying the 
location of the solutions of phylogenetic likelihood functions, the paper proves that one¬ 
dimensional phylogenetic likelihood functions have unique stationary points under the same 
model assumptions as Corollary 3.1. 

This result also provides a full characterization of one-dimensional likelihood functions 
of binary models (and those considered by Corollary 3.1). Indeed, since the derivatives of 
log-likelihood functions are continuous with at most one zero, this result implies that: 
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1. If there is no stationary point, then i{t) is a monotonic function (either strictly de¬ 
creasing or strictly increasing). 

2. If the stationary point to exists and is unique, then the function is increasing in the 
interval (0,to) and is decreasing in {to, oo). 

This simplicity of the shapes of phylogenetic likelihood functions provides a strong the¬ 
oretical foundation for the use of simple optimization methods to locate the maximum like¬ 
lihood branch length. However, we emphasize that these results are only about one-dimen¬ 
sional phylogenetic likelihood functions and do not mean that there is a unique (multivariate) 
stationary point of the likelihood surface or that simple hill-climbing methods will hnd this 
optima. 

4 Algebraic structures on the space of all logarith¬ 
mic relative frequency patterns under the Kimura 
2-parameter model 

While Section 3 provides a uniqueness result for the maximum likelihood branch lengths 
under three simple models, the result does not extend to more general models. In fact, as 
we will illustrate in the next section, the shapes of likelihood functions under the Kimura 
2 -parameter model [8] can be quite complicated, for example with multiple local and global 
maxima. 

In order to enable theoretical analyses of phylogenetic likelihood functions under more 
complex evolutionary models, here we introduce the concept of conditional logarithmic fre¬ 
quency patterns and study the algebraic structures on the space of such patterns. 

Definition 4.1. Given a rooted tree r with root p and N taxa, some labelings '0=('0i) • • • i V's) ^ 
qNxS and a vector of real constants {ci,, cs) we define the logarithmic relative 

frequency pattern 0(r, ^jJ, c) as the r x S matrix with entries 

<pi,s =Cs + log n ^ZaStuv) 

{u,v}£E(t) 

for i E Q, s = 1,..., S and Zi^g being the set of all extensions a of fig to all the nodes of r 
such that a{p) = i. 

For convenience, we will use the shorter term freguency pattern to refer to a logarithmic 
relative frequency pattern. 

In probabilistic terms, for a hxed site index s, the (f,s)-entry of a logarithmic relative 
frequency pattern 0(r, fj, c) is (up to a constant Cg) the logarithm of the likelihood of observ¬ 
ing state i at the root of the tree, given leaf states -fig. This dehnition is directly related to 
the formulation of the characteristic polynomials A^, whose coefficients are the product 
of the probabilities of observing state i and j at the two end points of an edge, given that 
the labeling fig is observed at the taxa. It is straightforward to verify that for models with 
uniform stationary distribution on a hxed tree, we have 

\ogblj = (j)i,s{Tl) + + Cg 
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for all i,j, s, where c* is a constant depending only on s, and Ti and T 2 are the trees obtained 
by removing the edge cq from the tree r and rooting the newly created trees at the endpoints 
of Co (see the proof of Theorem 4.1 in the Appendix for more details). 

Hence, to characterize the space of all phylogenetic characteristic polynomials under a 
given evolutionary model, we just need to characterize the space of all possible logarithmic 
relative frequency patterns under that model. 

Definition 4.2. We denote the space of all possible logarithmic relative frequency patterns 
under the Kimura 2-parameter model by 

G = {0(r, ^jJ,c) : T e e Tf, c G 

where T denotes the set of all rooted trees and Tf denotes the set of all tuples ('^ 1 ,... ,'^ 5 ) 
of S labelings of the taxa ofr. 

The goal of this section is to establish that for any sequence of S column vectors 
Vi,V 2 , ■ ■ ■ ,vs in there exists a tree r under the Kimura 2-parameter model, labelings 
fj = ('^1, 'ip2, ■ ■ ■, 'f’s) of its taxa and a vector of real constants c such that 

= [vi V2 ... Vs]. 

The existence of such tree is guaranteed indirectly by proving that under the Kimura 
2 -parameter model: 

1 . G is an algebraic subgroup of -1-). 

2 . G is a linear subspace of 

3. G is equal to itself. 

Noting again that the stationary distribution of the Kimura 2-parameter model is the 
uniform distribution across states tt = (1/4,1/4,1/4,1/4), the hrst two steps are conhrmed 
by the following theorem. 

Theorem 4.1. If the stationary frequency of the evolutionary model is the same for every 
state, then the following properties hold: 

1. (G, -I-) is a subgroup of -|-). 

2. G is path-connected. 

3. G is a linear subspace ofM.^^^. 


Sketch of proof. A detailed proof of this Theorem is provided in the Appendix, but the 
main arguments can be simply illustrated. The fact that G is closed under addition follows 
because we can add two frequency patterns just by gluing the roots of the two corresponding 
trees, labeling the taxa of r correspondingly and taking the pattern at the new root (Figure 
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Figure 2: G is closed under addition: we can add two frequency patterns [X] and [F] just 
by gluing the roots of the two corresponding trees, labeling the taxa of r correspondingly 
and taking the pattern at the new root. 


TC 




Figure 3: G is path-connected: We can connect any two patterns [X] and [F] by adding a 
new root p, joining it with the roots of the two corresponding trees with two new edges of 
length t and 1/t, respectively, and making p the root of r. 


12 









2). Similarly, we can create the inverse of a pattern by gluing all permuted versions of its 
corresponding tree (with an appropriate vector of real constants). 

To prove that G is path-connected, given two arbitrary trees with roots pi,p 2 , we create 
a new tree by adding a new root p, joining pi,p 2 with p by two new edges of length t and 
1/f, respectively, and making p the root of r (Figure 3). By varying t continuously from zero 
to inhnity, we can make a continuous path in G that connects the two frequency patterns. 
Since any path-connected subgroup of M" is a linear subspace [12], so is G. 

□ 

We note that although the aforementioned arguments are made for the Kimura 2-parameter 
model, which describes a model of DNA evolution (r = 4), Theorem 4.1 only requires that 
the stationary frequency of the evolutionary model is the same for every state. Hence, this 
result also extends to models with more parameters. 

Similarly, the fact that (G,+) is a subgroup of (M''^'^,-f) can be established under the 
assumption that the root distribution vr is uniform, without assuming that it is the stationary 
distribution of the evolutionary process. However, our current approach requires the uniform 
root distribution to be the stationary distribution for the proof of path-connectivity of G, 
and an alternative approach to the proof of path-connectivity will be needed if we want to 
extend the analyses to a more general framework. 

Recalling that the Kimura 2-parameter model corresponds to the uniform stationary 
distribution and a family of rate matrices indexed by k, the transition/transversion rate 
ratio, we then establish that when k = 3, the space of all frequency pattern G = The 

proof is done through proving by induction that G contains 4x5* independent frequency 
patterns (also proven in the Appendix): 

Theorem 4.2. The set of all possible logarithmic conditional frequency patterns with S sites 
under the Kimura 2-parameter model with k = 3 is equal to 

With those results, we hnally can establish the main theorems of the section. 

Theorem 4.3. For any sequence of column vectors Ui, V 2 , ■ ■ ■ ,vs in there exists a rooted 
tree r under the Kimura 2-parameter model with n = 3, S labelings '0i, '02, • • •, V's of its taxa, 
and a vector of real constants c such that 

0 (r,V>,c) = [ui V 2 ...vs]. 


While Theorem 4.3 provides a theoretical guarantee about the existence of a tree under 
the Kimura model with a given frequency patterns, the proof is not constructive. This raises 
some concerns about the practicality of the approach. For example, one can not derive an 
estimation of the number of edges required to produce a given frequency pattern. Those 
concerns are addressed by the following theorem. 

Theorem 4.4. A tree as in Theorem 4.3 can be constructed with at most 64S' edges. 

Not only does the theorem provide an upper bound on the number of edges required to 
construct a tree with a given frequency pattern, its proof also provides a simple algorithm 
to construct such a tree. 
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Proof of Theorem 4.4. The main steps of the proof are as follows: 

Step 1. As shown in the Appendix, any frequency pattern of the form [a:, 0,0,0]* can be pro¬ 
duced (up to a real constant ci) by a tree r with 4 edges and some labeling -0 of its 
taxa. 

Step 2. Using r from Step 1, we create a tree r' of 16 edges by gluing the roots of 4 different 
versions ri, T 2 , ts, of r together and dehne S labelings of r' as follows. 

— For s = 1, we copy the labeling of r onto r'. 

V’i(a) = ^p{a) 

for each taxon a of ti,T 2 ,T 3 , 

— For all s > 2, the labelings are dehned as follows: 

-05 (a) = a-* (' 0 (a)) if a is a taxon of tj 

where a is the permutation {A G T C) in cycle notation. 

The construction of r' is similar to the construction of the inverse of elements in the 
group G in the proof of Theorem 4.1. Because of symmetry, for s > 2, the frequency 
pattern corresponding to site s at the root of the newly created tree will be the same 
for every state while for s = 1 , the frequency pattern of r' is obtained by multiplying 
the frequency pattern of r by a factor of 4. 

We deduce that the pattern created by (r', {'0*}) is: 


/ Ax 

0 . 

• ° ^ 


( Cl 

C2 • 

• Cs ^ 

0 

0 . 

. 0 

+ 

Cl 

C2 ■ 

■ Cs 

0 

0 . 

. 0 

Cl 

C2 ■ 

■ Cs 

V 0 

0 . 

. q) 


V Cl 

C2 ■ 

■ cs / 


for some real constants Ci, C 2 ,..., C 5 . 

Step 3. By similar arguments, for any i = 1,2, 3,4 and s = 1, 2,..., S', we can construct a tree 
of 16 edges for any patterns with S sites whose only non-zero entry is at the (i, s)- 
position. Hence, it takes 16 x 4S = 64S edges to construct a tree with an arbitrary 
given frequency pattern. 

□ 

5 Non-uniqueness of stationary points: Kimura 2-paranieter 
model 

In this section, we provide an example for which there are multiple stationary points of the 
likelihood function. To construct such an example, we hnd two polynomials pi{x) and p 2 {x) 
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with coefficients 6 i, 62 such that the product piP 2 has 2 local maxima in [ 0 , 1 ], and pi and p 2 
can be expressed as positive linear combination of the basis polynomial functions Pi derived 
from an evolutionary model (as will be carefully described in this section). This gives a 
counter-example with S = 2 sites. 

Consider the Kimura 2-parameter model with k = 3 which has the rate matrix 


-5/8 

3/8 

1/8 

1/8 A 

3/8 

-5/8 

1/8 

1/8 

1/8 

1/8 

-5/8 

3/8 

1/8 

1/8 

3/8 

-5/8 


This matrix has eigenvalues 0 > —7 > —27 where 7 = 0.5. The transition probabilities 
under this evolutionary model can be computed explicitly by 


Pi(t) = 0.25 -|- 0.25exp(—0.5t) -|- 0.5exp(—t) 

P 2 {t) = 0.25 -|- 0.25 exp(—0.5t) — 0.5exp(—t) 

P^(t) = Pi{t) = 0.25 - 0.25exp(-0.5f) 

where Pi(t), P 2 (t), Ps(t), P^it) are the probabilities of transitioning from state A to state 
A, T, G, C, respectively. This simple model is “universal” in an appropriate sense as shown 
in the end of the paper. 

This leads to a representation of the likelihood as the product of two different linear 
combinations of the transition polynomials 

Pi{x) = 0.25 + 0.25x + 0.5x^ 

P 2 {x) = 0.25 + 0.25a; - 0.53;^ (5.2) 

p^(x) = Pi{x) = 0.25 - 0.25a; 


where x = exp(—0.5t). 

We assume that the likelihood is computed by observing two sites si and S 2 , and that 
the edge of interest e is a pendant edge with the observed values at that tip being A for both 
sites. Assume further that the state observation probabilities at the inner node of the edge 
e are provided by 


lA = [0.24977275, 0.34067358, 0.2051904, 0.20436327] 

and 

= [0.25,0.16087344,0.29328435,0.29584221]. 

As discussed earlier, the log-likelihood function can be computed as 


lit) = log(Ai(t)) -h log(A 2 (t)) 


(5.3) 


where 

4 

i=l 
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Log-likelihood functions with multiple stationary points 



X = exp( — 0. bt) 

Figure 4: The log-likelihood (5.3) as a function of x = exp (—0.5f) for various values of the 
coefficients of the characteristic polynomial. 


Plots of the log-likelihood function ^ and its perturbations (by varying the coefficients 
slightly) in terms of x and t are provided in Figure 4 and Figure 5, respectively. The hgures 
show that i has three stationary points (two local maxima at ti < t 2 and one local minimum), 
all in the interval [0,1]. The fact that i{ti) > £(^ 2 ) for some cases and i{ti) < £(^ 2 ) for some 
others indicates that there exist some values of 6 | such that i{ti) = ^(^ 2 ), he. the smoothly 
varying likelihood function can even have two global maxima. 

We note that these examples can be achieved under the assumption that given any pos¬ 
itive coefficients of the inner node, we can hnd some trees under the Kimura 2-parameter 
model with these precise coefficients. This assumption is conhrmed by the following result, 
proven in the Appendix. 

Theorem 5.1. For every set of positive coefficients rjl, there exist a phylogenetic tree r and 
S labelings V’l, V’ 2 , ■ ■ ■ ,'f’s of the taxa such that for some edge e in r, the one-dimensional 
likelihood function on e under the Kimura 2-parameter model with k = 3 satisfies 

i{T, t) = Co + log I ^ VtPi{t) 

s=l V i 

where Pi{t) is the probability of transition from state A to state i and Cq is a constant. 
Moreover, such a tree r can be constructed with at most bdS* -f 1 edges. 
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Log-likelihood functions with multiple stationary points 



Figure 5: The log-likelihood (5.3) as a function of branch length t for various values of the 
coefficients of the characteristic polynomial. 


In our examples, the upper bound on the number of edges to produce the given frequency 
pattern is 64 x 2 -f 1 = 129 edges. 

Remark 5.1. While the algorithm to construct a tree given a freguency pattern given by 
Theorem 4.4 always outputs a star-tree (a tree without internal edges), we note that 

1. We can approximate any star tree by resolved trees with arbitrary precision. 

2. The maximum number of stationary points of a polynomial of degree four is 3, hence 
small perturbations on the coefficient of a polynomial of degree four with three stationary 
points do not change the number of stationary points. 

We deduce that there are resolved trees for which the one-dimensional likelihood function 
on certain edges have multiple maxima. 

Since a resolved tree with n taxa has 2n — 3 edges, the upper bound on the number of 
edges of a resolved tree for which the one-dimensional likelihood function on certain edges 
has multiple maxima is 2 x 129 — 3 = 255 edges. 
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6 Universality and complexity of the Kimura 2-parameter 
model 

As we discussed earlier in the paper, the main idea behind the results in Section 3 and Section 
4 is that by using the Fundamental Theorem of Algebra, we can decompose a complicated 
evolutionary model into smaller modules, each of which can be approximated either by a 
“linear” model or by a “quadratic” model. This paradigm focuses on the branch lengths of 
the tree and is independent of the state space hi of the evolutionary model, which provides 
a way to represent advanced evolutionary models (amino-acid models, codon models) by 
simple ones (nucleotide models). 

This motivates the problem of constructing a complete characterization of one-dimen¬ 
sional likelihood functions. The main question is: does there exist an evolutionary model 
that can represent all one-dimensional likelihood functions of any time-reversible evolution¬ 
ary model? 

Such a model M, if it exists, and which we will refer to as a universal model, needs to 
satisfy the following two conditions: 

1. All one-dimensional likelihood functions under any reversible evolutionary model can 
be written as a product of polynomials, each of which is a positive linear combination 
of the transition polynomials of A4. 

2. For every set of positive coefficients bfj, there exists a phylogenetic tree r and S labelings 
- 01 , - 02 , ■ ■ ■ y'lps oi the taxa such that for some edge e in r, the one-dimensional likelihood 
function on e under the A4 satisfies 

<’(r,()=C'i + 5^1og(X]'’’A'W 

s=l \ ij 

for some constant Ci. 

In this section, we will prove that the Kimura 2-parameter model with k = 3 is, in fact, 
a universal model. The key components of the proof are Theorem 5.1, the Fundamental 
Theorem of Algebra and the fact that the transition polynomials of the Kimura 2-parameter 
model effectively span a large class of linear and quadratic polynomials. 

6.1 Universality of the Kimura 2-parameter model 

We hrst make the following observation, proven in the Appendix. 

Lemma 6.1. If f is a real-coefficient polynomial that satisfies 

1. f is positive on [0,1], 

2. deg f = 1 or f is a quadratic polynomial with no real root, 

then f can be written as positive linear combination of the transition polynomials of the 
Kimura 2-parameter model if and only if 
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deg / = 1 and /(—I) > 0 , 
or 

2. deg / = 2 and f has no root inside the set 

B = {z E C : \z + 1\ < 1 or l^: — 1| < V^}. (6.1) 


This enables us to establish the universality of the Kimura 2-parameter model. 

Theorem 6.1 (Universality). If L is a one-dimensional phylogenetie likelihood function of 
a tree under an arbitrary time-reversible model that satisfies Assumption 2.1, then up to 
translation and rescaling, L is egual to a one-dimensional likelihood under the Kimura 2- 
parameter model. 

That is, there exist ci, C2, C3 > 0 such that 


L{t) = C 2 TK 2 p(r,V’,C 3 t) - Cl, Wt E [0,oo), 

where Lk 2 p(t, "05 ■) is the one-dimensional likelihood function under the Kimura 2-parameter 
model on some edge of a tree r with labeling fj. 

Proof. Assumption 2.1 implies that the function 

Cix) ■= L (— logx ) 

V 7 / 

is a polynomial in x for some 7 > 0. Since C is continuous and the set B defined by (6.1) is 
compact, if we define 

Cl = 1 -h sup \C{z)\, 

zeB 

then by the triangle inequality, the polynomial C{x) ci has no root in B. 

By the Fundamental Theorem of Algebra, the polynomial C{x) -t- ci can be written as 

s 

C{x) + Cl = JJfi's(a:), 

s=i 

where each ps is either a quadratic polynomial with no real root, or a polynomial of degree 1 . 
Moreover, each ps is positive on [0,1] and has no root in B (which also implies Ps{—1) > 0 if 
deg ps = 1). Lemma 6.1 implies that each ps can be written as a positive linear combination 
of the transition polynomials of the Kimura 2-parameter model 

9six) = ^biPijix). 
ij 

We deduce that ^ 

log(£(a:) -t- Cl) = ^ log I ■ 

s=l \ ij / 
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We recall that the Kimura 2-parameter model has symmetries such that any transition 
probability Pij{t) is in fact equal to PAiit) = Pii^) for some 1. Therefore, by grouping 

vt--- E ''t. 

i,j-Pij=PAl 

we have 

log(£(x) + Cl) = ^ log ■ 

Also, the characteristic polynomial for the Kimura 2-parameter model (5.1) with k = 3 
is parameterized by a; = exp(—0.5t) such that the one-dimensional likelihood Lk 2 p(t, V’,^) 
satisfies 

LK 2 p{T,'il:,t) = £K2p(r,'0,exp(-O.5f)). 

Now, Theorem 5.1 guarantees that there exists a tuple (r, ■0) under the Kimura 2- 
parameter model on an edge of the tree such that 


log CK 2 p{r,'ip,x) 


for some positive constant C 2 . 
In other words, we have 


s 

log C 2 -h ^ log 

s=l 



C{x) = C 2 CK 2 p{r,'ip,x) - Cl, Vxe(0,l]. 


Hence, 


or 



C2^K2p(T,^,-21ogx) - Cl, Vxe(0, 1], 


L{t) = C2LK2p{r,^|J,C3t) - Cl, C3 = 7/2, Wte[0,oo). 


That is, up to translation and rescaling, L is equal to a one-dimensional phylogenetic likeli¬ 
hood function under the Kimura 2-parameter model. □ 


Since the set of rate matrices for a given evolutionary model that satisfy Assumption 2.1 is 
dense in the set of all possible rate matrices under the same evolutionary model (Remark 2.1), 
we also have the following corollary. 

Corollary 6.1. Any one-dimensional phylogenetic likelihood function under an arbitrary 
time-reversible evolutionary model can be uniformly approximated with arbitrary precision by 
(rescaled and translated) one-dimensional phylogenetic likelihood functions under the Kimura 
2-parameter model. 

We also note that the rescaling and translation constants in the statements of Theo¬ 
rem 6.1 can not be removed: Lemma 6.1 indicates that some polynomial function can not 
be represented exactly as a Kimura 2-parameter likelihood function. For example, one of 
the transition polynomials of the Jukes-Cantor model is 


J(x) = 0.25 + 0.75X 
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which has J(—1) < 0. For this reason, some likelihood functions under the Jukes-Cantor 
model may not be represented exactly by the Kimura 2-parameter model without adjusting 
by an additive constant. 

6.2 Complexity of the Kimura 2-parameter model 

The universality results in the previous section can be adapted easily to analyze the set of all 
one-dimensional phylogenetic likelihood functions under the Kimura 2-parameter model. The 
following complexity results imply that one-dimensional likelihood functions under advanced 
evolutionary models can be more complex than it is typically assumed by phylogenetic 
inference algorithms. 

First, it is straightforward to check that Theorem 6.1 still holds (without changing the 
proof) if we replace the one-dimensional phylogenetic likelihood function L with an arbitrary 
polynomial P in x = exp(— yf) for some 7 > 0 and relax Assumption 2.1. Moreover, if P 
is of degree n, then by Theorem 5.1, it can be represented by a one-dimensional likelihood 
function of a tree with at most (64?7, -|- 1) edges with respect to some n-site labeling of its 
taxa. 

Corollary 6.2. Given an arbitrary polynomial P of degree n and 7 > 0, then up to transla¬ 
tion and rescaling, P(exp(—yt)) is egual to a one-dimensional likelihood under the Kimura 
2-parameter model on a phylogeny with at most QAn -|- 1 edges. 

This corollary indicates that by increasing the number of sites and the size of the tree, 
we can obtain likelihood functions shaped like an arbitrary polynomial in the interval [ 0 , 1 ]. 
For example, given an arbitrary finite sequence ti,t 2 , ■ ■ ■ ,tk G (0, 00 ), we can construct a 
polynomial Pk that peaks precisely at Xk = exp(—0.5tfc) and use Corollary 6.2 to obtain the 
following result. 

Corollary 6.3. Given an arbitrary finite seguence ti,t 2 ,...,tk G (0, 00 ), there exists a 
phylogenetic tree r and some labeling of its taxa such that for some edge of the tree, the 
one-dimensional likelihood function under the Kimura 2-parameter model peaks precisely at 
tl,t2, . . . ytk- 

Furthermore, since rescaling and translation do not change the relative order of the likeli¬ 
hood values at the stationary points, we can make any of the U’s (or all of them) the function’s 
global maxima. 

Finally, we can replace the phylogenetic likelihood functions in Corollary 6.1 by an arbi¬ 
trary continuous function / with finite limit to obtain the following density result. 

Corollary 6.4. The space of all rescaled and translated one-dimensional phylogenetic likeli¬ 
hood functions under the Kimura 2-parameter model is dense in the space of all non-negative 
continuous functions on [ 0 , cxd) with finite limits. 

Proof. Let / be a continuous function on [0, 00 ) with finite limit. Define 

9{x) = /(- log(x)) Vx G (0,1], 
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then g{x) can be extended continuously to [0,1]. By Weierstrass’s theorem [13], there exists 
a sequence of positive polynomials {Pn} such that 

sup \Pn{x) - g{x)\ -)■ 0. 
xe[o,i] 


This implies that 

sup |Pn(exp(-t)) - f{t)\ -)■ 0. 

tS[0,oo) 

On the other hand, we deduce from Corollary 6.1 that P„(exp(—f)) is, up to rescaling 
and translation, a one-dimensional likelihood under the Kimura 2-parameter model. This 
completes the proof. □ 

7 Non-reversible Markov models of evolution 

As we mentioned in Section 2 , the analyses in the previous sections can be described in the 
more general framework of probabilistic inference for graphical models. In this framework, 
the likelihood function can be be dehned as the marginal distribution on the leaf nodes of a 
joint probability distribution that factorizes over the edges of the tree via the non-negative 
kernels (also referred to as potential functions) ke{i,j,t). The one-dimensional phylogenetic 
likelihood functions can be obtained by hxing all but one branch length. 

In this section, we briefly analyze the extent to which our analyses of one-dimensional 
likelihood functions are valid in this more general setting. As we illustrate below, the results 
in this section do not assume the reversibility of the kernels and thus apply for non-reversible 
models of evolution. However, we need to modify our assumptions accordingly. 

Several parts of our analysis rely on the core assumption that the kernel functions need to 
be polynomials of a: = exp(—yf) for some 7 > 0. Thus we require the following assumption, 
which is the equivalent of Assumption 2.1 but in a more general setting. 

Assumption 7.1 (Polynomial representation). There exists a constant yg > 0 and polyno¬ 
mials {x) such that 

^e(bj,^) =P*/(exp(-yef)) Vf, 

for all i,j & Q and e G E{T). 

This assumption implies that the limit of ke{i,j,t) for large t exists, that is, the kernels 
are stationary. We also note that using the density results for Bernstein’s polynomial ap¬ 
proximation (see, for example, [13]), any non-negative continuous function on [0, cx)) with 
hnite limit can be approximated with arbitrary precision by some kernels that satisfy As¬ 
sumption 7.1. 

Under this assumption, the characteristic polynomials As(a;) can be dehned in a similar 
manner and the result in Section 3 (Theorem 3.1) is still valid. 

Theorem 7.1. Under Assumption 7.1, if for every site index s, the polynomial Xg has 
only real roots, then one-dimensional likelihood function has at most one stationary point. 
Moreover, if such a point exists, it is also a global maximum. 
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While Section 4 is specifically developed to analyze the Kimura 2-paranieter model, the 
logarithmic relative frequency pattern can be extended easily by replacing the transition 
probabilities across the edge e by the kernel ke and by setting the distribution of the root by 
the uniform distribution. Building upon this concept, we can study the algebraic structure 
of the space of all frequency patterns and obtain a partial extension of Theorem 4.1 in a 
more general setting as described below. However, the proofs of Theorems 4.2 and 4.3 are 
tailor-made for the Kimura 2-parameter model and are not easily extended to the general 
case. We leave their extension as open problems. 

We do obtain the following partial extension of Theorem 4.1 in the more general setting. 

Theorem 7.2. The following properties hold: 

1. (G, -I-) is a subgroup of ,+). 

2. Assume 

lim ke{i,j,t) = Pei^) = “ fim^e(b = Pe^(l) = ^ij 

t—>-oo r t^o 

for all i,jEfl and e G E{T), where 6ij = 1 if i = j and 6ij = 0 otherwise. Then G is 
a linear subspace ofW^^. 

We recall that the frequency patterns can be defined without the polynomial represen¬ 
tation of the likelihood. Thus, in Theorem 7.2, Assumption 7.1 is not required. We further 
note that for part (1) of the theorem to be valid, we do not need to assume that the kernels 
are stationary. In fact, the only condition required is that the distribution at the root is 
uniform. To provide a proof for path-connectivity of G, however, the conditions about the 
behavior of the kernels at 0 and oo are necessary. 

Polynomial representation of likelihood functions (Assumption 7.1) is needed to extend 
the results of Section 6. By the same arguments as in the proof of Theorem 6.1, we obtain 
an equivalent result. 

Theorem 7.3. If L is a one-dimensional phylogenetic likelihood function of a tree under a 
model whose kernels satisfy Assumption 7.1, then up to translation and rescaling, L is egual 
to a one-dimensional likelihood under the Kimura 2-parameter model. 

8 Conclusions and discussion 

In this work, we investigate the problem of characterizing the shape of one-dimensional 
phylogenetic likelihood functions. Our results classify all evolutionary models into two cat¬ 
egories: 

1. For binary, Jukes-Cantor and Felsenstein 1981 models: the one-dimensional likelihood 
function has at most one stationary point. 
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2. For Kimura 2-parameter model and more advanced evolutionary models: the shape of 
the one-dimensional likelihood function can be much more complex. In fact, the space 
of all rescaled and translated one-dimensional phylogenetic likelihood functions under 
such a model is dense in the set of all non-negative continuous functions on [0, cx)) with 
hnite limits. 

Despite the complexity of the one-dimensional likelihood functions under advanced evo¬ 
lutionary models, we prove that all one-dimensional phylogenetic likelihood function are 
essentially Kimura 2-parameter likelihood functions. This result establishes a strong foun¬ 
dation for the use of the Kimura 2-parameter as the building block of all evolutionary models. 

Our results are based on two novel techniques. First, we introduce and use characteris¬ 
tic polynomial representations of one-dimensional phylogenetic likelihood functions and the 
Fundamental Theorem of Algebra to decompose any evolutionary models into smaller mod¬ 
ules, each of which resembles the Kimura 2-parameter model. Second, we introduce the new 
concept of logarithmic relative frequency patterns and analyze algebraic structures on the 
space of such patterns. These structures open a new way to explore the space of all possible 
likelihood functions. Moreover, by analyzing these structures, we are able to tackle the in¬ 
verse problem of constructing a phylogenetic tree that has a given frequency pattern at the 
root. This enables us to construct phylogenetic trees that approximate any given likelihood 
function with arbitrary precision. 

There are several avenues for improvement. Firstly, while we know that the shape of 
one-dimensional likelihood function can be very complex, it is not clear how frequently mul- 
timodality might be encountered in practice and to which degree it affects the accuracy 
of phylogenetic algorithms. Since the space of high degree polynomials are dominated by 
multimodal functions, one might expect that as the number of sites and the size of the tree 
increase, multimodality becomes more likely. However, since the space of phylogenies is 
known to possess considerable hidden structure which sometimes lead to counter-intuitive 
properties, careful analysis of the space of all rescaled and translated one-dimensional phy¬ 
logenetic likelihood functions under the Kimura 2-parameter model are required to evaluate 
this hypothesis. Secondly, although the focus of this work is on one-dimensional phyloge¬ 
netic likelihood functions, it is possible to utilize the framework we propose to study full 
phylogenetic likelihood functions. This will be a subject for future work. 
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10 Appendix 


Proof of Remark 2.1. If we denote the entries of the diagonalizing matrix M and Y of Q by 
mij and respectively, then 



k 


where —are the eigenvalues of Q. (The eigenvalues are known to be non-positive, so 
are non-negative.) 
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Since the set of rational numbers Q is dense in M"*", we can find r{k,l) G Q"*" such that 
for all k, r{k,l) —)■ as / approaches inhnity. If we dehne 

Q[j = 

k 

then ^ Q element-wise as I approaches inhnity. Since r(/c, 1) are all rational the matrices 
are of hxed hnite dimension, we can also hnd 7^ > 0 and d{k, 1) E N such that r{k, 1) = 
d{k,l)'yi. □ 

Proof of Theorem 4.1. We dehne the equivalence relation ~ on as follows: m ~ n if and 
only if there exists a vector of real constants c = (ci,..., cs) such that for alH = 1, 2, 3,4 
and s = 1,..., S', we have 

If we dehne 

= log ^ Yl PZaS^nv) 

a£Zi^a (u,v)&E{t) 

for i E fl, s = 1,S and Zi^g being the set of all extensions a of 'ifg to all the nodes of r 
such that a(p) = i. 

Recall that the Kimura 2-parameter model has a uniform stationary distribution, for all 
r, -0, c, we have 0(r, fj, c) 

1. (Addition): Consider any two elements xi,X 2 E G. By the dehnition of G and since 
the stationary frequency of the evolutionary model is the same for every state, there 
exist trees Ti,T 2 with ni,n2 taxa and labelings ipi,'if 2 such that 


Xi h{Ti,'ipi), i = l, 2 . 


If we construct a new tree r from ti and T2 by gluing the roots pi,p2 and label the 
taxa of r corresponding to 'ipi,'ip2i then we have 

[ft(r,v-)lM = iog E n n 

a,£Zi^g {u,v)gE{ti) {u,v)gE{t2) 

where each term a in the sum corresponds uniquely to a pair of extensions (a^, a^) of 
'0^,'02 fo the internal nodes of Ti,T2, respectively, such that a^(p) = a^(p) = i. 

Therefore, 

= logE n n 

= [h(ri, V’i)]m + [^('^2, ^2)]i,s 


for alH G n and s = 1 ,..., S'. 


Therefore 

and 


/i(r, fj) ~ h(ri, fji) - 1 - h(r2, -02) 
xi -I- a ;2 ~ h(r, ip) E G 


which implies that G is closed under addition. 
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2. (Inverse): Consider any element x E G and its corresponding representative tree r and 
labeling tjj. For any permutation a of the states, we define the labeling 'ijjcr as 

for every taxon u of T. For example, if a is the permutation {A G T C) in cycle 
notation, then ipa- is obtained from -0 by replacing Ahy G, G by T, T by C and G by 

A. 

Now let (To be a permutation of order r on the state space create r identical copies 
ti,T 2 , ■ ■ ■ ,Tr of the tree r with labelings 'ipo-o: • • •, and glue the root of all 

the trees together with taxon labeling 7 corresponding to the labelings of ti, r 2 ,..., r^. 
Then because of symmetry, the frequency pattern / at the root of the newly created 
tree /i will be the same for every state, i.e., / ~ 0. We deduce that 0 G G and for every 
X E G, there exists y E G such that x + y = 0. 

This property and the fact that G is closed under addition prove that (G, +) is a 
subgroup of +). 

3. (Connectedness): Consider any two elements Xi,X 2 E G and their corresponding trees 

ti,T 2 , labelings and vectors of real constants ci,C 2 . For any a E (0,1), we 

create a new tree T{a) by adding a new root p, joining pi,p 2 by new edges of length 
ti = tan(|Q;),f 2 = l/G, respectively. We make p the root of r and label the taxa of r 
according to t/’i,t/’ 2 - 

Now we note that when o; —)■ 0, we have 

h{T{a),tp) -E h(ri, tpi) + log - 

r 

since the contribution of T 2 becomes stationary (the stationary frequency is 1/r because 
of the model’s symmetry). Similarly, when o; —)■ 1, we have 

h{T{a),'tp) -E h{T 2 , -02) + log -. 

r 

Therefore, the function g{a) = (f){T{a),'ip) can be extended continuously to the closed 
interval [0,1]. By changing c continuously from Ci to log {1/r), varying a continuously 
from 0 to 1, then changing c continuously from log {1/r) to C 2 , we can make a path in 
G that connects xi and X2- 

4. Since any path-connected subgroup of M" is a linear subspace [12], so is G. 


□ 

Proof of Theorem 4.2. Denote by Pi the set of all rooted trees with one edge (which have 
varying branch lengths) and 


H = {0(r, V’, c) : r G "H, V’ = (V’l, ^ 2 , • • •, V’s) G M'^, c G 
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Note that in the context of this paper, trees with different branch lengths (or in other 
words, different valnes of x) are considered as different trees. Thus the set H dehned here is 
non-trivial. 

We have 

( 10 . 1 ) 

where x = exp(—0.5t), j = A, G, T, C, and t is the length of the unique edge of r. 

Let Xi = 1/4, X 2 = 1/2, X 2 , = 3/4. We will prove, by induction on S', that H contains 
4 X S' independent frequency patterns. 

For S = 1, by considering the 4 different patterns (A), (G), (T), (G) at the only leaf 
and the 3 values of x (corresponding to different branch lengths) described above, we can 
create a set of 4 x 3 = 12 different pairs A quick check by computer shows that the 

corresponding frequency patterns generated by those pairs span the whole vector space 
We can achieve similar result for S' = 2 with the patterns (A, G), (G, T), (T, G), (G, A). 

Now assume that for S = n, H contains 4 x n independent frequency patterns of the 
form (10.1). 

For I = A,G,T,C and x G [0,1], we dehne the building blocks 

fii(x) := llogFu(a:) logf!c(2^) logPiT(a:) log P,c(x)] 

/ P.(i.) \ 

W,(x) - P,(i2) , 

V ) 

The induction hypothesis implies that there exist 4 x n independent frequency patterns 
of the form (10.1). This means that for some labelings "01, 'ip 2 -, ■ ■ ■ i 'i’Am the block matrix 

f \ 



\ / 


has maximal rank An, where 

/ R^i{xi) ■ ■ ■ R^^{xi) \ 

Bs := R^i{x2) ■ ■ ■ R^^{x2) 

\ ■ --R^^ixs) J 

For s = 1,..., 4n, we consider all the labelings obtained by appending 'ijjs with one of the 
four nucleotides A, G, T, G. By doing so, we create a set of 48n different frequency patterns. 
We want to prove that the block matrix 




( Bi 

Wa{x) 

B2 

Wa{x) 

Bin 

Wa{x) 

Bi 

Wg{x) 

B2 

Wg{x) 

Bin 

WGix) 

Bi 

Wt{x) 

B2 

WtIx) 

Bin 

Wt{x) 

Bi 

Wg{x) 

B2 

Wg{x) 

y Bin 

Wg{x) 


has maximal rank An + 4. 

Note that this matrix is row-equivalent to 



where each row of V is of the form Ri{xk) — RA^Xk) for i = G,T,C. (This is done by 
subtracting the blocks {Bg Ri{x)) by the block {Bg Ra{x)) then rearranging the row to 
obtain the sub-matrix J at the top-left corner.) 

On the other hand, from the case S' = 1, we have 


rank 


/ WAix) \ 
Wg{x) 
Wt{x) 

\ Wcix) / 


= 4, 


which implies that rank(l/) = 4. Hence, rank(C') = rank( J) -|- rank(l/) = 4n -|- 4. 

We deduce that for every S, the set G of all possible logarithmic conditional frequency 
patterns with S sites under the Kimura 2-parameter model is a linear subspace of 
(Theorem 4.1) that contains 45* linearly independent vectors. This implies that G = 

□ 


Proof of Theorem 4.4 (Step 1). (Any pattern of the form n = [a: 0 0 0] can be produced by 
a tree r with four edges.) 

Denote 

Xl{t) = PAA{t) X2{t) = PAcit) 

X3{t) = PAxit) Xiit) = PAcit) 
we note that in the Kimura 2-parameter model, X 3 {t) = Xi{t). 
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Now consider two trees ti and r 2 , each with one edge, whose branch lengths are t and s, 
respectively. We label the only nodes of ti and T 2 by the patterns '0i = (^) and -02 = (G*), 
and obtain the freqnency patterns /i(t) and / 2 (s) respectively. By gluing the roots of ti (1 
edge) and the “inverse” of the tree T 2 (3 edges)), we obtain a tree T(t, s) with 4 edges whose 
frequency pattern is equivalent to 


fl{t) - /2(s) ~ 



Xi{t)x4{s) 

X4it)x2isy 


X2{t)x4{s) 

a; 4 (t)a;i(s)’ 


On the other hand, we note that for the Kimura 2-parameter model (5.1), 


I = 1 -|- 2exp(—0.5t) 

xyt) 

only admits values in the interval [1,3], while Xi(s)/a; 4 (s) is a continuous decreasing function 
in s that admits all values in the interval [1, cxd). Hence, for every f > 0, there exists a unique 
k{t) >0 such that 

X2it)x4{k{t)) _ ^ 

X4{t)xi{k{t)) 

Moreover, k{t) is a continuous function in t and 


lim k(t) = 00 lim k(t) = kn 
t-s-0 


where ko satishes xiiko)/X 4 {kQ) =3. 
Now if we denote 

9{t) 


xi{t)x4{k{t)) 

X4{t)x2{k{t)) 


then g{t) is a continuous function that satishes 


lim g{t) = 1 lim 5 f(t) = 00 . 

t —^00 t —^0 


We deduce that for a range of t, 


fi{t) - f 2 {k{t)) ~ [log^(t), 0,0,0] 

which admits every patterns of the form [x, 0, 0, 0] with a; > 0. Similarly 

f 2 {k{t)) - flit) ~ [-log^(t), 0,0,0] 

admits every patterns of the form [x, 0, 0, 0] with a: < 0. This completes the proof. 

□ 

Proof of Theorem 5.1. From Theorem 4.1, there exists a rooted tree r, a labeling and a 
vector of real constants c = (ci,..., cs) such that 

Cs + log ^ 7r(*) Yl PZaStuv) =logi7]^i). 

aGZi^g {u,v)&E(t) 
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(Recall that Zi^g is the set of all extensions a of V’s to all the nodes of r snch that a{p) = i-) 
For any t > 0, we create a new tree r(t) by adding an edge e of length t to the root p 
and labeling the additional taxon by the constant vector {A, A, ... ,y4). The log-likelihood 
fnnction on e of r(f) given this taxon labeling is 

n*) = 5^ log [ ^ irw n pzJt^v)p,A{t) 

-5=1 y i a^Zi^s (u^v)^E{t) 

s s 

S=1 S=1 

Theorem 4.4 implies that the tree r can be constrncted with at most 645 edges. Hence, r(t) 
has at most (645 -|- 1) edges. □ 

Proof of Lemma 6.1. We hrst consider the case of linear fnnctions. Assnme that f{x) = 
ax + h snch that / is positive in [0,1]. We dednce that h + a = /(I) > 0 . Hence / can be 
written as 



f{x) = ax + h 

= 2(& - «) Q - + ip + a) (yj + \x - +{b + a) (^^ + ^x + ^x^^ 

= 2(6 - a)P‘i{x) + {b + a)P 2 {x) + {b + a)Pi{x) 

nsing the transition polynomials Ppx) from eqnation (5.2). 

Since {Pi, P2, P3} are linearly independent, we dednce that / can be expressed as positive 
linear combination of Pi, P2, P3 if and only if /(—I) = 6 — a > 0. 

If f{x) is a monic polynomial of degree 2 with no real roots, then / can be written as 

f{x) = x^ — 2ax + a"^ + b‘^ 

= [{a - 1)'^ + b'^ - l]Pi{x) 

+ [{a - If + b^ - 2 ]P 2 {x) 

+2[{a + lf + b^-l]P 3 ix). 


The coefficients are positive if and only if a ± 6i do not belong to B. 


□ 
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